y
p
g
pp
,
a data set into sub-data sets, each of which is a cluster. When it
ed to investigate how genes function within a genome under a
uster analysis can be used to partition genes into groups according
measured responsive strengths so as to discover how the genes
in different ways. Because of this reason, cluster analysis has
y popular in many biological/medical pattern discovery tasks [Le
al., 2003; Liu, et al., 2005; Huang and Pan, 2006; Pan, 2006;
and Saito, 2013]. Cluster analysis has been used to examine how
endent conformation changes [Chaturvedi, et al., 2020]. In a lung
udy, the use of cluster analysis has discovered that malignant
ioma tumours demonstrate high ROR1 expression [Miyake, et al.,
o examine whether bovine granulocytes is a symptom of
n of liver functions, cluster analysis has been used to confirm the
hip of gene expression between bovine granulocytes and
n of liver functions [Kizaki, et al., 2020]. The molecules which
milar biological functions in similar biological pathways are
to show similar activities in an experiment. A cluster analysis
s revealed different numbers of differentially expressed genes
ng to the same drought stress in wheat, maize and rice [Wei and
18].
of these researches are hypothesis-driven and have little or no a
owledge about how a cluster model should look like in advance.
words, no knowledge will tell which group of molecules should
ate a similar response to a stress before a pattern analysis process
machine learning algorithm starts. Cluster analysis is thus a good
to be used for this kind of pattern analysis.
uster data, the first issue is how to measure the distances between
ts. Different clustering algorithms may use different metrics for
g the distances or the similarities between data points for
types of data. Among them, the Euclidean distance, the Hamming